ABSTRACT


Analyzing students’ activities in their learning process is an 
issue that has received significant attention in the educa- 
tional data mining research field. Many approaches have 
been proposed, including the popular sequential pattern mining.
However, most works do not focus on the time of occurrence
of the events within the activities.
This paper relies on the hypothesis that we can get a better 
understanding of students’ activities, as well as design more 
accurate models, if time is considered. With this in mind, 
we propose to study time-interval patterns. 

To highlight the benefits of managing time, we analyze the
data collected from 113 first-year university students
interacting with their LMS. Experiments reveal that frequent
time-interval patterns can indeed be identified, which means
that some students’ activities are regulated not only by the
order of learning resources but also by time. In addition,
the experiments emphasize that the set of intervals strongly
influences the patterns mined and that the set of intervals
that represents natural human time units (minute, hour, day,
etc.) seems to be the most appropriate one to represent the
time gap between resources.

Finally, we show that time-interval pattern mining brings
additional information compared to sequential pattern
mining. Indeed, not only is the view of students’ possible
future activities less uncertain (in terms of learning resources
and their temporal gap), but also, as soon as two students
differ in their time-intervals, this difference indicates that
their following activities are likely to diverge.


Keywords 
Students’ behavioral patterns, time-interval pattern mining,
interval granularities, sequential pattern mining.


1. INTRODUCTION 


The wealth of data that can be collected from a Learning
Management System (LMS), mainly the logs of students’
interactions with learning resources, provides opportunities to
get a more comprehensive understanding of students’ learning
process: point out engaged or at-risk students, identify
the most commonly studied or the most difficult resources,
highlight recurrent students’ activities, etc. In addition to
this thorough understanding, inferences or decisions can be
drawn: estimate students’ outcomes, predict students’ future
behavior (including dropout), personalize learning by
providing students with information or recommendations, etc.
To support such understanding, inferences, and decisions, data
mining methods have been applied. Pattern mining, which
discovers frequent patterns of events in data, is one of these
methods and is also used in a large number of application
fields. Sequential Pattern Mining (SPM) consists of
discovering patterns when data is sequential in nature. These
patterns, named sequential patterns, are frequent ordered
sequences of events.

In the educational field, a sequential pattern often represents
a recurrent sequence of learning resources, which we call an
activity [30, 5].


The time of occurrence of events is often part of the data
to be mined. However, in most cases, the patterns
mined do not contain temporal information. Nevertheless,
the literature has introduced different ways of including such
information in patterns. We can, for example, cite temporal
patterns, made of events that are associated with their
time of occurrence [36], their duration [8], or the time gap
between the events. In [9], gaps between events are grouped
into intervals, resulting in time-interval sequential patterns.
Since a time-interval pattern conveys more information than
its corresponding sequential pattern, such patterns are still
the focus of research works [33]. In the rest of the paper,
time-interval patterns will be referred to as ti-patterns and
sequential patterns as s-patterns.


We think that ti-patterns are adequate to represent stu- 
dents’ activities. Indeed, it is rare that two students per- 
form exactly the same activities, in both learning resources 
and time, even though they share underlying sequential ac- 
tivities. To the best of our knowledge, no work in the field 
of educational data mining has focused on the mining of ti- 
patterns. 

In this work, we thus rely on the hypothesis that mining
ti-patterns will contribute to a better view and understanding
of students’ learning activities. These patterns not only
indicate in which order students interact with learning
resources, but also provide information about the temporal
relationship between these resources. For example, let us
consider that students tend to interact sequentially with two
resources, both of which are lecture slides. The sequence
of both resources represents a sequential activity. Suppose
that mining ti-patterns highlights that the time gap between
both resources tends to be less than 1 minute for some
students and between 2 and 4 hours for others. Beyond this
sequential activity, we can thus deduce that there are two
typical behaviors.

To support our hypothesis, we will conduct a study to
evaluate whether ti-patterns can actually be identified from
students’ activity data and to what extent ti-patterns provide
additional information about students’ activities.


In the following sections, we will first present an overview 
of related works on sequential and temporal pattern mining 
(Section 2). We then present the methodology we adopt to 
support the hypothesis that we draw (Section 3). Section 
4 details the experiments we conduct on a real dataset and 
presents some ti-patterns. The last sections discuss the re- 
sults (Section 5), then conclude the work and present our 
expected future work (Section 6). 


2. LITERATURE REVIEW 
2.1 Sequential Pattern Mining (SPM) 


Sequential Pattern Mining is a popular task in Data Mining,
introduced by Agrawal and Srikant in [1]. SPM aims to
discover frequent sequential patterns in sequential databases.
A sequential database $D$ is a set of tuples $D = \{(sid_i, d_i)\}$,
where $sid_i$ is the unique identifier of a sequence, and $d_i$ an
input sequence. A sequence is an ordered list of events:
$s = \langle E_1 E_2 \ldots E_n \rangle$, with $E_i \in E$, the set of events. To
understand what a frequent sequential pattern is, let us first
define what a sub-sequence is. $\alpha = \langle E_1 \ldots E_n \rangle$ is a sub-sequence
of $\beta = \langle E'_1 \ldots E'_m \rangle$ if:

$$\exists\, 1 \le j_1 < j_2 < \ldots < j_n \le m \ \text{such that} \ E_i = E'_{j_i}, \ \forall i \in [1 : n]$$

We also say that $\beta$ is a super-sequence of $\alpha$, or that it
contains $\alpha$. Let us now define what the support of a sequence
is. The support of $\alpha$, noted $\text{supp}(\alpha)$, is the number of
sequences in the sequential database $D$ that contain $\alpha$. Based
on both definitions, we can now define that a sub-sequence
$\alpha$ is a frequent sequential pattern if $\text{supp}(\alpha) \ge \delta$, for a
defined minimum support threshold $\delta$. We define SP as the
set of frequent sequential patterns.
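The definitions above can be made concrete with a short sketch. It assumes, for illustration only, that sequences are Python sequences of hashable events, that D is a dict mapping sequence ids to sequences, and that δ is an absolute count; the function names are ours.

```python
from typing import Hashable, Sequence


def is_subsequence(alpha: Sequence[Hashable], beta: Sequence[Hashable]) -> bool:
    """True if alpha = <A1..An> is a sub-sequence of beta = <B1..Bm>, i.e. there
    exist indices 1 <= j1 < ... < jn <= m with A_i = B_{j_i}."""
    it = iter(beta)
    # `event in it` consumes the iterator up to (and including) the first match,
    # so successive matches are forced to occur at strictly increasing positions.
    return all(event in it for event in alpha)


def support(alpha: Sequence[Hashable], database: dict) -> int:
    """supp(alpha): number of sequences of D = {sid: sequence} that contain alpha."""
    return sum(1 for seq in database.values() if is_subsequence(alpha, seq))


def is_frequent(alpha: Sequence[Hashable], database: dict, delta: int) -> bool:
    """alpha is a frequent sequential pattern if supp(alpha) >= delta."""
    return support(alpha, database) >= delta
```

For instance, with D = {1: "abcb", 2: "acb", 3: "bca"}, the pattern "ab" is contained in sequences 1 and 2, so its support is 2.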


Many SPM algorithms have been proposed in the literature.
The most commonly cited ones are GSP [32], PrefixSpan
[26], and SPADE [37]. All these algorithms use the “apriori
property”: “If a sequence s is not frequent, then none of
the super-sequences of s is frequent.” Thus, when a pattern
is infrequent, it is not extended. Algorithms can be
divided into two main approaches. Apriori-like algorithms
(also called breadth-first search algorithms), such as the
Generalized Sequential Pattern Mining (GSP) algorithm [32], are
the first algorithms that have been proposed. However, these
algorithms suffer from scalability problems, mainly due to
memory requirements. Depth-first search algorithms, which
include pattern-growth algorithms, do not suffer from this
memory complexity, which explains their popularity.

For several years, the most common SPM algorithm has been
the Prefix-Projected Sequential Pattern Growth (PrefixSpan)
algorithm [26], a pattern-growth algorithm that relies on
projected databases. Projected databases generally reduce
the search space, as the size of the projected databases
decreases at each iteration. However, the main cost is linked
to the generation of these projected databases. The
pseudo-code of PrefixSpan is presented in Algorithm 1.


Algorithm 1 PrefixSpan($\alpha$, $l$, $D$)

Inputs:
$\alpha$: a sequential pattern and $l$ its length.
$D$: a sequential database, or a projected database.
Outputs:
SP: the set of all frequent sequential patterns.
Method:
Scan $D$ to find all frequent items $b$.
for all $b$ do
    add $\alpha' = \langle \alpha b \rangle$ to SP as a new sequential pattern.
end for
for all $\alpha'$ do
    create the $\alpha'$-projected database $D|_{\alpha'}$
    call PrefixSpan($\alpha'$, $l + 1$, $D|_{\alpha'}$)
end for
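Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions (each element of a sequence is a single event rather than an itemset, δ is an absolute count, and a projected database is stored as pairs of sequence index and suffix start); it is not the implementation of [26] nor Gao's code [14].

```python
from collections import defaultdict


def prefixspan(database, min_support):
    """Minimal PrefixSpan sketch for sequences of single events (no itemsets).
    database: list of sequences; min_support: absolute threshold delta.
    Returns {pattern (tuple of events): support}."""
    patterns = {}

    def mine(prefix, projected):
        # projected: list of (sequence index, start position of the suffix),
        # at most one entry per sequence.
        counts = defaultdict(set)
        for sid, start in projected:
            for event in set(database[sid][start:]):
                counts[event].add(sid)
        for event, sids in counts.items():
            if len(sids) < min_support:
                continue  # apriori property: infrequent patterns are not extended
            new_prefix = prefix + (event,)
            patterns[new_prefix] = len(sids)
            # alpha'-projected database: suffix after the first occurrence of event
            new_projected = []
            for sid, start in projected:
                seq = database[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == event:
                        new_projected.append((sid, pos + 1))
                        break
            mine(new_prefix, new_projected)

    mine((), [(sid, 0) for sid in range(len(database))])
    return patterns
```

With the database ["abc", "acb", "ab"] and δ = 2, the sketch returns a, b, (a, b) with support 3 and c, (a, c) with support 2; (b, c) and (c, b) appear in a single sequence each and are pruned.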


In the works mentioned above, both the database and the
patterns are sequential. However, in some cases, the database
can be temporal, i.e. contain information about the time of
occurrence of the events. In these cases, a sequence is
defined as $s = \langle (t_1, E_1), (t_2, E_2), \ldots, (t_n, E_n) \rangle$, where $(t_i, E_i)$
represents an event $E_i$ and its time of occurrence $t_i$.


When sequential patterns are mined from these databases,
time can be used only as information about the order of
events, as in SPADE [37]. The time of appearance of
events can also be used as a constraint. For example, in [18]
the authors consider that when two consecutive items in a
sequence are separated by a time gap bigger than a
predefined threshold, they are temporally too distant to
represent a meaningful association. In the same context, [31]
discards uninteresting patterns by introducing an interval
constraint between items.
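A maximal-gap constraint of the kind described for [18] can be illustrated with a small occurrence check. This sketch is ours: the function name and the threshold max_gap are hypothetical, and a temporal sequence is assumed to be a time-sorted list of (timestamp, event) pairs, as defined above.

```python
def occurs_with_max_gap(pattern, sequence, max_gap):
    """True if `pattern` (a list of events) occurs in the temporal `sequence`
    (list of (timestamp, event) sorted by time) such that every two consecutive
    matched events are at most `max_gap` apart."""

    def search(k, last_time, start):
        if k == len(pattern):
            return True
        for pos in range(start, len(sequence)):
            t, e = sequence[pos]
            if last_time is not None and t - last_time > max_gap:
                break  # timestamps are sorted: later events are even further away
            if e == pattern[k] and search(k + 1, t, pos + 1):
                return True
        return False

    return search(0, None, 0)
```

In the sequence ⟨(0, a), (10, b), (100, a), (105, b)⟩, the pattern ⟨a b⟩ satisfies a maximal gap of 5 thanks to its second occurrence, but not a maximal gap of 4.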


2.2 Sequential Pattern Mining in EDM
Sequential Pattern Mining has been extensively used in
Educational Data Mining. It is mainly used to identify
frequent patterns of students’ activities [16, 28], including
those that maximize student learning performance [10].
In [21], SPM is used to study the differences between students’
productive and unproductive learning behaviors and thus
identify high- versus low-performing students. A similar
objective has been studied on group work systems to understand
the success factors in group behavior [27, 25].

SPM is also used to detect learning problems early, as
in [20], where frequent sequential patterns are mined to flag
interaction sequences that are indicative of problems.


Going one step further, SPM can serve as a first step in
decision making. In [7], the prerequisite structure of skills is
discovered by identifying relations between variables from data.
The algorithms developed in [28, 34, 11] provide students
with personalized recommendations of learning resources
according to their current activity or their learning style.

A complete view of various approaches used in educational 
data mining is presented in [2]. 
2.3 Temporal Pattern Mining

Temporal information appears to be fundamental in many
contexts, hence the many works interested in mining
patterns that contain temporal information. Here
again, time information can be used in several ways and for
different goals: gaps, durations, intervals, etc.


Time information is often used as a gap between events of
a pattern. For example, in [36], the author considers that
each occurrence of a sequential pattern may have different,
but close, temporal elements. The author thus proposes to
associate each pair of events of a pattern with minimal, mean,
and maximal gap values between these events. The resulting
model is made of sequential patterns enriched with
temporal information, called delta patterns. Similarly, [15]
proposes to add temporal information to each pair of events
in a sequential pattern. This information, referred to as
an annotation, represents a typical gap value between each
pair of events of a pattern. In this work, the acceptable
variation around this typical gap value is automatically
evaluated. Unlike the previous works, [35] predefines a
maximal gap value between events of a pattern,
which results in temporal patterns called chronicles. [22]
introduces an even more constraining frame: the exact gap
interval value is imposed. This approach results in a
decrease in the support of each pattern, and thus in the
number of extracted patterns.
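To make the delta-pattern idea of [36] concrete, the sketch below summarizes, for each pair of positions of a pattern, the minimal, mean, and maximal gap observed over its occurrences. It assumes the occurrences have already been found (each given as the list of timestamps of the matched events); the function name is hypothetical and the sketch does not reproduce the mining algorithm of [36].

```python
from itertools import combinations
from statistics import mean


def delta_annotation(occurrences):
    """occurrences: one list of timestamps per occurrence of the same sequential
    pattern <E1 ... En>. Returns {(i, j): (min gap, mean gap, max gap)} for every
    pair of positions i < j of the pattern (0-based)."""
    n = len(occurrences[0])
    annotation = {}
    for i, j in combinations(range(n), 2):
        gaps = [occ[j] - occ[i] for occ in occurrences]
        annotation[(i, j)] = (min(gaps), mean(gaps), max(gaps))
    return annotation
```

With two occurrences at timestamps [0, 10, 40] and [5, 25, 35], the first two events are separated by gaps of 10 and 20, giving the annotation (10, 15, 20) for that pair.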

In addition to the gap value, [17] exploits the duration of
events. Each element of a pattern is composed of the event,
associated with its begin and end timestamps. The authors
propose an Apriori-like algorithm that uses a hypercube
representation of temporal sequences.

More recently, [13] introduces an Apriori-like temporal
pattern mining algorithm on multi-modal data streams. Unlike
the previous works, they use not only the time gap between
events (which represents the duration of the event), but also
the exact starting time of each event.


In line with the works presented above, [6] also manages gap
values between events, which are grouped into intervals. Unlike
other works, the intervals of gap values are predefined, and
form “time-interval sequential patterns”. A time-interval
sequence is defined as:

$$\alpha = \langle E_1 \tau_1 E_2 \tau_2 \ldots \tau_{l-1} E_l \rangle$$

where $E_i \in E$, the set of events, for $1 \le i \le l$, and
$\tau_i \in TI$, the set of time-intervals. The sequence $\alpha$ is a
frequent time-interval pattern if $\text{supp}(\alpha) \ge \delta$. We note TP the set of
frequent time-interval patterns of a database $D$. In their
article, the authors propose two algorithms called I-Apriori
and I-PrefixSpan, and results show that I-PrefixSpan
outperforms I-Apriori both in computing time and scalability.
The pseudo-code of the I-PrefixSpan algorithm is presented
in Algorithm 2.
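The containment of a time-interval sequence in a temporal sequence can be sketched as follows. We assume, for illustration, that TI is given as a list of half-open intervals [low, high) and that a ti-pattern is stored as a flat tuple (E1, τ1, E2, ..., El) whose τ entries are indices into that list; in this sketch, each gap is measured between the timestamps of two consecutively matched events, which need not be adjacent in the sequence. The interval values in the usage note are hypothetical.

```python
def interval_index(gap, intervals):
    """Index of the interval [low, high) of TI that contains `gap`, or None.
    Assumption: intervals are half-open and listed in increasing order."""
    for idx, (low, high) in enumerate(intervals):
        if low <= gap < high:
            return idx
    return None


def contains_ti_pattern(sequence, pattern, intervals):
    """sequence: list of (timestamp, event) sorted by time.
    pattern: (E1, tau1, E2, ..., tau_{l-1}, El), taus being indices into `intervals`.
    True if some occurrence matches every event and every gap interval."""
    events, taus = pattern[0::2], pattern[1::2]

    def search(k, last_time, start):
        if k == len(events):
            return True
        for pos in range(start, len(sequence)):
            t, e = sequence[pos]
            if e != events[k]:
                continue
            if k > 0 and interval_index(t - last_time, intervals) != taus[k - 1]:
                continue  # gap falls in the wrong interval for this occurrence
            if search(k + 1, t, pos + 1):
                return True
        return False

    return search(0, None, 0)
```

With TI = [(0, 60), (60, 3600), (3600, ∞)] in seconds and the sequence ⟨(0, slides1), (30, slides2), (7200, slides2)⟩, both ⟨slides1 τ0 slides2⟩ and ⟨slides1 τ2 slides2⟩ are contained, but ⟨slides1 τ1 slides2⟩ is not.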


A few years later, [19] pointed out that most algorithms in
the literature use time information only as a time constraint
or to represent the time-interval between successive items
[9]. The novelty of this work is that the delay is taken into
account not only between successive items, but also between
distant items. The “multi time-interval (MI) sequential
pattern” models the time-intervals between all pairs of
items within a pattern. Two algorithms have been proposed,

Algorithm 2 I-PrefixSpan($\alpha$, $l$, $D$)

Inputs:
$\alpha = \langle E_1 \tau_1 \ldots \tau_{l-1} E_l \rangle$: a temporal pattern.
$l$: the length of $\alpha$.
$D$: a sequential database, or a projected database.
Outputs:
TP: the set of all frequent temporal patterns.
Method:
Scan $D$ to find each frequent pair $(\tau_l, E_{l+1})$, where $\tau_l \in TI$ is the gap interval between items $E_l$ and $E_{l+1}$.
for all $(\tau_l, E_{l+1})$ do
    add $\alpha' = \langle E_1 \ldots \tau_{l-1} E_l \tau_l E_{l+1} \rangle$ to TP as a new temporal pattern.
end for
for all $\alpha'$ do
    create the $\alpha'$-projected database $D|_{\alpha'}$
    call I-PrefixSpan($\alpha'$, $l + 1$, $D|_{\alpha'}$)
end for


MI-Apriori and MI-PrefixSpan, which are highly similar to the
I-PrefixSpan and I-Apriori algorithms.
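The prefix-projection loop of Algorithm 2 can be sketched in Python as follows. This is our own minimal illustration, not the code of [6] or the implementation from [12]. It assumes half-open intervals [low, high) given as a list (τ values are indices into it), an absolute support threshold, and projected databases that keep every position where an occurrence of the current prefix ends, since different occurrences of a prefix can lead to different time-intervals for the next event.

```python
from collections import defaultdict


def interval_index(gap, intervals):
    # Assumption: half-open intervals [low, high) listed in increasing order.
    for idx, (low, high) in enumerate(intervals):
        if low <= gap < high:
            return idx
    return None


def i_prefixspan(database, intervals, min_support):
    """Sketch of ti-pattern mining by prefix projection.
    database: list of temporal sequences [(timestamp, event), ...] sorted by time.
    Returns {(E1, tau1, E2, ...): support}, taus being indices into `intervals`."""
    patterns = {}

    def mine(prefix, projected):
        # projected: {sequence id: set of positions where the prefix ends}
        candidates = defaultdict(lambda: defaultdict(set))  # (tau, event) -> sid -> ends
        for sid, ends in projected.items():
            seq = database[sid]
            for end in ends:
                t_end = seq[end][0]
                for pos in range(end + 1, len(seq)):
                    t, event = seq[pos]
                    tau = interval_index(t - t_end, intervals)
                    if tau is not None:
                        candidates[(tau, event)][sid].add(pos)
        # Each frequent pair (tau_l, E_{l+1}) extends the prefix.
        for (tau, event), new_projected in candidates.items():
            if len(new_projected) >= min_support:
                new_prefix = prefix + (tau, event)
                patterns[new_prefix] = len(new_projected)
                mine(new_prefix, new_projected)

    # Length-1 patterns: frequent events, projected on all their positions.
    first = defaultdict(lambda: defaultdict(set))
    for sid, seq in enumerate(database):
        for pos, (_, event) in enumerate(seq):
            first[event][sid].add(pos)
    for event, projected in first.items():
        if len(projected) >= min_support:
            patterns[(event,)] = len(projected)
            mine((event,), projected)
    return patterns
```

For example, with TI = [(0, 60), (60, ∞)] and three sequences in which b follows a after 30, 20, and 120 seconds, δ = 2 yields ⟨a⟩, ⟨b⟩, and ⟨a τ0 b⟩ (support 2), whereas ⟨a τ1 b⟩ has support 1 and is pruned.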


Discovering time-interval patterns has attracted considerable
effort, due to its widespread applications. However,
several challenges remain, such as the definition of an
adequate set of intervals (whether manual or automatic),
including the problem of the granularity of the intervals.


2.4 Temporal Granularities 

As soon as intervals are introduced, an issue arises: how to 
choose these intervals? 

[3] proposes to manage different temporal granularities. An
algorithm based on Timed Automata with Granularities
(TAGs), combined with heuristics, is proposed. TAGs
test whether a candidate time pattern appears frequently in
a time sequence, and the heuristics reduce the number of
candidates. [29] focuses on mining periodic patterns,
where interesting periods cannot be defined in advance. Two
temporal granules are proposed: a fine-grained granule for
hourly periods and a coarse-grained granule for daily
periods. The time distribution of different time granularities is
then estimated by using a mixture of Gaussian
distributions.


2.5 Temporal Data Mining in EDM 

To the best of our knowledge, little use has been made of
Temporal Pattern Mining in the EDM field. [23] takes time
into account by evaluating the rate at which students change
the learning resources of interest, and progressively improves
“when” resources have to be recommended to the student.
In a learning context where students can choose both
which courses and exams to take and when to take them, the
research work presented in [4] uses time information that
corresponds either to the “semester in which the exam was
taken” or to the “delay with which it was taken”. Using this
time information, the authors study the course and exam
schedules that students follow and better understand students’
behaviors. Using clustering and comparison, they are then able
to suggest improvements to the scheduling of students’ courses
and exams.
3. DEFINITIONS AND METHODOLOGY


The previous literature review highlights that time-intervals
are mainly adopted to model temporal patterns. The
algorithm proposed in [6], I-PrefixSpan, has the main advantage
of considering intervals as a core element of the patterns and
of the mining process: intervals are considered as a constraint
on the patterns, not as supplementary information about
them. This is the main reason why we choose to adopt
this algorithm in our work.

We start by introducing definitions that will be used in the 
following methodology and in the experiments. 


3.1 Definitions 

Let $p = \langle E_1 \tau_1 E_2 \ldots \tau_{n-1} E_n \rangle$ and $p' = \langle E'_1 \tau'_1 E'_2 \ldots \tau'_{m-1} E'_m \rangle$
be two ti-patterns and $s = \langle E^s_1 E^s_2 \ldots E^s_l \rangle$ be an s-pattern,
with $n$ (resp. $m$ and $l$) the length of the pattern $p$ (resp.
$p'$ and $s$). Given these patterns, we state the following
definitions. Recall that TP is the set of frequent ti-patterns and
SP the set of s-patterns.


DEFINITION 3.1. ti-form of an s-pattern
$p$ is a ti-form of $s$, denoted by $\text{isform}(s, p)$, if and only if:
$(n = l) \land (E_i = E^s_i), \forall i \in [1 : n]$.
$\text{ti-form}(s)$ is the set of ti-forms, in TP, of $s$.


DEFINITION 3.2. s-form of a ti-pattern
$s$ is an s-form of $p$ if and only if: $(n = l) \land (E_i = E^s_i)$,
$\forall i \in [1 : n]$. $\text{s-form}(p)$ is the (unique) frequent s-form, in
SP, of $p$.
$\text{s-form}(P)$ is the set of frequent s-forms (in SP) of the
ti-patterns $p \in P$.


DEFINITION 3.3. s-equivalence of ti-patterns
$p$ and $p'$ are s-equivalent, denoted $\text{s-eq}(p, p')$, if and only if:
$(n = m) \land (E_i = E'_i), \forall i \in [1 : n]$.
In other words, $\text{s-form}(p) = \text{s-form}(p')$.


DEFINITION 3.4. Prefix of a ti-pattern
$p'$ is a prefix of $p$ if and only if:
$(m \le n) \land (E'_i = E_i, \forall i \in [1 : m]) \land (\tau'_i = \tau_i, \forall i \in [1 : m-1])$.


DEFINITION 3.5. Extension of a pattern 
p’ is an extension of p if p is a prefix of p'. We note ext(p) 
the set of extensions of p that belong to TP. 
A similar definition can be given for s-patterns.


DEFINITION 3.6. Extended part of a pattern
Let $p'$ be an extension of $p$. The extended part of $p$, with
respect to $p'$, is the pattern $p''$ such that $\text{concat}(p, p'') = p'$,
the last event of $p$ being shared with the first event of $p''$.
Thus, $p'' = \langle E'_n \tau'_n E'_{n+1} \ldots \tau'_{m-1} E'_m \rangle$.
We note $\text{extPart}(p)$ the set of extended parts of $p$, i.e. the
set of patterns that, when concatenated with $p$, result in a
pattern that belongs to TP.
A similar definition can be given for s-patterns.
Example: Let $p = \langle e_1 I_1 e_2 \rangle$ and $p' = \langle e_1 I_1 e_2 I_2 e_1 \rangle$ be two
ti-patterns. $p'' = \langle e_2 I_2 e_1 \rangle$ is an extended part of $p$.


DEFINITION 3.7. Pseudo-equivalence of ti-patterns
$p$ and $p'$ are said to be pseudo-equivalent, denoted $\text{psd-eq}(p, p')$, if and only if:
$\text{s-eq}(p, p') \land (\tau_{n-1} \neq \tau'_{n-1}) \land (\tau_i = \tau'_i, \forall i \in [1 : n-2])$, i.e. they
differ only in their last time-interval.
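Definitions 3.1 to 3.7 can be checked mechanically. In this sketch, which uses a representation of our own choosing, an s-pattern is a tuple of events and a ti-pattern is a flat tuple (E1, τ1, E2, ..., En), so events sit at even positions and time-intervals at odd positions; the function names are hypothetical.

```python
def s_form(p):
    """Sequential form of a ti-pattern (E1, t1, E2, ..., En): its events only."""
    return tuple(p[0::2])


def is_ti_form(s, p):
    """Def. 3.1: p is a ti-form of s (same length, same events)."""
    return s_form(p) == tuple(s)


def s_equivalent(p, q):
    """Def. 3.3: p and q have the same sequential form."""
    return s_form(p) == s_form(q)


def is_prefix(q, p):
    """Def. 3.4: q is a prefix of p (same leading events and time-intervals)."""
    return len(q) <= len(p) and tuple(p[:len(q)]) == tuple(q)


def extended_part(p, p_ext):
    """Def. 3.6: the part p'' of the extension p_ext such that concatenating p and
    p'' (sharing the last event of p) gives p_ext."""
    if not is_prefix(p, p_ext) or len(p_ext) == len(p):
        raise ValueError("p_ext is not a strict extension of p")
    return tuple(p_ext[len(p) - 1:])


def pseudo_equivalent(p, q):
    """Def. 3.7: s-equivalent and differing only in the last time-interval."""
    return (len(p) >= 3 and s_equivalent(p, q)
            and p[:-2] == q[:-2] and p[-2] != q[-2])
```

On the example of Def. 3.6, extended_part(("e1", "I1", "e2"), ("e1", "I1", "e2", "I2", "e1")) returns ("e2", "I2", "e1"), i.e. ⟨e2 I2 e1⟩.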


3.2 Methodology 


To support our hypothesis and identify the actual value of a
ti-pattern model, we define a methodology. More precisely,
this methodology aims at identifying whether there actually are
temporal regularities in students’ activities, whether
managing temporal activities gives a better view of
students’ future activities, and concretely what types of
activities are mined. Recall that mining ti-patterns is quite
new in educational data mining.


We intend to mine ti-patterns in a temporal database D, 
which is a database made up of temporal sequences. A 
temporal sequence is an ordered list of events (concretely 
a list of resources students interacted with) and their asso- 
ciated timestamp. Each temporal sequence represents one 
student’s temporal activities and each student is represented 
by a unique (and long) sequence. 

Our methodology relies on four steps, described hereafter. 


3.2.1 Determining the set of time-intervals 

Note that although timestamps are discrete values, their
precision is so high that relying on time-point (or gap)
patterns will probably only lead to infrequent patterns. For
example, two sequences that only differ by one second,
$\langle (0, E_1)(3, E_2) \rangle$ and $\langle (0, E_1)(4, E_2) \rangle$, will correspond to two different
patterns. Grouping gaps into intervals to form ti-patterns will
increase the support of patterns. In addition, if the intervals
are appropriate, the loss of precision about temporal activities
will be limited.
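The effect of grouping gaps can be shown on the two sequences above. This sketch uses hypothetical interval boundaries of 60 and 3600 seconds (defining [0, 60), [60, 3600), and [3600, +∞)) and builds the ti-sequence of a whole temporal sequence from the gaps between consecutive events; the function names are ours.

```python
import bisect


def to_interval(gap, bounds):
    """Map a gap to the index of its interval, assuming hypothetical boundaries
    bounds = [b1, b2, ...] defining [0, b1), [b1, b2), ..., [b_k, +inf)."""
    return bisect.bisect_right(bounds, gap)


def ti_sequence(temporal_seq, bounds):
    """Turn a non-empty <(t1, E1)(t2, E2)...> into <E1 tau1 E2 ...> using the gaps
    between consecutive events."""
    out = [temporal_seq[0][1]]
    for (t_prev, _), (t, e) in zip(temporal_seq, temporal_seq[1:]):
        out += [to_interval(t - t_prev, bounds), e]
    return tuple(out)
```

Both ⟨(0, E1)(3, E2)⟩ and ⟨(0, E1)(4, E2)⟩ map to the same ti-sequence ("E1", 0, "E2"), so their occurrences now count towards a single pattern.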

So, before assessing the relevance of mining ti-patterns, we
have to choose an adequate set of time-intervals, since
this set influences the information conveyed.


Let $TI = \{I_0, I_1, \ldots, I_k\}$ be a set of time-intervals, where
$I_i = [gapmin_i ; gapmax_i]$ is an interval that contains all gap
values between $gapmin_i$ and $gapmax_i$. Notice that the set
of intervals should represent a continuum of gap values from
$gapmin_0$ to $gapmax_k$.


We propose to evaluate the quality of a set of intervals $TI$
with 3 criteria:

The fitting ratio. It is the ratio between the number of
non-empty intervals and the total number of intervals. A
non-empty interval is an interval that is part of frequent
patterns. The higher the ratio, the better the set of
intervals, as the number of “useless” intervals is low.

The number of intervals. On the one hand, the more
intervals, the higher the potential of the model. Notice that
when $TI = \{I_0\} = [0 : +\infty]$, the problem comes down to PrefixSpan.
On the other hand, using too many intervals increases the
complexity of the model. In addition, with many
intervals, the ti-patterns discovered will probably be
infrequent. Thus, a good set is a set that has an in-between
number of intervals.

The horizon. It is represented by $gapmax_k$, the upper bound of
the last interval (the maximal time value of the set of
intervals). The larger the horizon, the more complete the model,
as it is able to represent long-term recurrences.


From our point of view, the best set of intervals is the one 
that maximizes the fitting ratio while having a large horizon, 
with a limited number of intervals. 
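The three criteria can be computed directly from a set of intervals and the frequent ti-patterns mined with it. This sketch assumes our own representation (intervals as (low, high) pairs, ti-patterns as flat tuples whose odd positions hold interval indices) and a hypothetical parameter min_uses: an interval counts as non-empty if it appears at least that many times in the frequent patterns.

```python
from collections import Counter


def interval_criteria(intervals, ti_patterns, min_uses=1):
    """Fitting ratio, number of intervals, and horizon of a set TI.
    intervals: list of (low, high) pairs in increasing order.
    ti_patterns: frequent ti-patterns as tuples (E1, tau1, E2, ...), with taus
    given as indices into `intervals`."""
    uses = Counter(tau for p in ti_patterns for tau in p[1::2])
    non_empty = sum(1 for idx in range(len(intervals)) if uses[idx] >= min_uses)
    return {
        "fitting_ratio": non_empty / len(intervals),
        "number_of_intervals": len(intervals),
        "horizon": intervals[-1][1],  # upper bound of the last interval
    }
```

For three intervals of which only the first two occur in the frequent patterns, the fitting ratio is 2/3; raising min_uses filters out rarely used intervals and lowers the ratio accordingly.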
3.2.2 Comparing sets of s-patterns and ti-patterns
After having fixed the set of intervals, the set TP of
ti-patterns can be mined. In this second step, we aim at
comparing the set TP with the set SP of s-patterns, and
propose some measures to perform this comparison.


First of all, we propose to study the number of patterns, and 
their average length, to get a coarse-grained view of the set 
of patterns. Of course, this measure cannot be used alone, 
as the goal is definitely not to mine the highest number of 
patterns. 

Second, we study the correspondence between both sets of 
patterns. Let us start by noticing that the number of ti- 
patterns cannot be deduced (not even approximately) from 
the number of s-patterns. A brief explanation follows. 


Let $s$ be a frequent s-pattern and $\text{ti-cand}(s) = \{ts_1, ts_2, \ldots, ts_k\}$
the set of the candidate ti-forms of $s$. Note that $ts_i$
may be infrequent. Two cases arise:
- $|\text{ti-cand}(s)| = 1$. This case occurs when all the occurrences
of $s$ have the same candidate ti-form $ts$. Here, $\text{supp}(ts) =
\text{supp}(s)$, thus $ts$ is also frequent. The corresponding set of
s-patterns is noted $S^1$.
- $|\text{ti-cand}(s)| > 1$. This case occurs when some occurrences
of $s$ have different candidate ti-forms. As a consequence,
$\forall i, (\text{supp}(ts_i) < \text{supp}(s)) \land (\sum_{i=1}^{k} \text{supp}(ts_i) = \text{supp}(s))$.
Three possibilities follow:
  - $\nexists\, ts_i, \text{supp}(ts_i) \ge \delta$: there exists no frequent ti-form of $s$,
  thus the number of frequent patterns decreases. The
  associated set of patterns is noted $S^0$.
  - $\exists!\, ts_i, \text{supp}(ts_i) \ge \delta$, thus $\forall j \in [1 : |\text{ti-cand}(s)|], j \neq i \Rightarrow
  \text{supp}(ts_j) < \delta$. In this case, there exists a unique
  frequent ti-form of $s$, and the number of patterns remains stable.
  - $\exists (i, j), (i \neq j) \land (\text{supp}(ts_i) \ge \delta) \land (\text{supp}(ts_j) \ge \delta)$. In this
  case, there exist several frequent ti-forms of $s$, and the number of
  patterns increases.
The set of patterns associated with both last cases is noted
$S^{1+}$. Based on this, we first introduce the pattern loss
measure, which represents the ratio of s-patterns that have no
ti-form in TP ($s \in S^0$).
$$\text{pLoss}(SP) = \frac{|SP| - \left|\bigcup_{p \in TP} \text{s-form}(p)\right|}{|SP|} \qquad (1)$$
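Eq. (1) translates directly into code once patterns are represented as tuples. In this sketch (our own representation), s-patterns are tuples of events and ti-patterns are flat tuples (E1, τ1, E2, ...), so the s-form of a ti-pattern is obtained by keeping every other element.

```python
def pattern_loss(sp, tp):
    """Eq. (1): share of frequent s-patterns with no ti-form among the frequent
    ti-patterns. sp: set of s-patterns (tuples of events);
    tp: set of ti-patterns (E1, tau1, E2, ...)."""
    covered = {tuple(p[0::2]) for p in tp}  # union of s-form(p) for p in TP
    # s-form(p) is defined as an element of SP, hence the intersection.
    return (len(sp) - len(covered & set(sp))) / len(sp)
```

With SP = {⟨a⟩, ⟨b⟩, ⟨a b⟩, ⟨b a⟩} and TP = {⟨a⟩, ⟨b⟩, ⟨a τ0 b⟩}, only ⟨b a⟩ has no ti-form, so pLoss = 1/4.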


To complete the pattern loss measure, we define the support
loss measure, which applies to any s-pattern that has
at least one frequent ti-form ($s \in S^{1+}$). The support loss
measure evaluates the proportion of “lost” occurrences of $s$,
i.e. occurrences that have no correspondence in TP.

Let $s$ be an s-pattern and $P = \{p_1, p_2, \ldots, p_r\}$ be a set of
ti-patterns, where $\text{isform}(s, p_i), \forall p_i \in P$. The support loss
of $s$ is defined in equation (2).


$$\text{sLoss}(s) = \frac{\text{supp}(s) - \text{supp}^*(P)}{\text{supp}(s)} \qquad (2)$$


where $\text{supp}^*(\cdot)$ is the support of a set of patterns, defined in
equation (3).

$$\text{supp}^*(P) = \left|\bigcup_{p \in P} \text{Seq\_id}(p)\right| \le \sum_{p \in P} \text{supp}(p) \qquad (3)$$


where $\text{Seq\_id}(p)$ is the set of ids of the sequences of $D$ that
contain $p$. We can see that the support of $P$ is not
defined as the sum of the supports of the patterns in $P$. To
explain this, let us consider $P = \{p_1, p_2\}$, with $p_1$ and $p_2$
two s-equivalent ti-patterns.

By definition, the s-form of $p_1$ (which is the same as the
s-form of $p_2$) is counted at most once per sequence of $D$.
Similarly, $p_1$ and $p_2$ are each counted at most once per
sequence, but both can occur in the same sequence. As a
consequence, the support of $P$ may be lower than the sum of
the supports of $p_1$ and $p_2$.


The support loss defined above applies for a s-pattern. If 
the support loss has to be evaluated on a set of patterns, the 
average support loss and the associated standard deviation 
can be used. 
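Eqs. (2) and (3) can be sketched as follows. The containment test is passed in as a function (for instance a ti-pattern matcher), so the sketch stays independent of how sequences are stored; the function names are ours, and using the population standard deviation for the aggregate is our assumption, since the text does not specify which one is used.

```python
from statistics import mean, pstdev


def seq_ids(pattern, database, contains):
    """Seq_id(p): ids of the sequences of D (a dict id -> sequence) containing p.
    `contains(sequence, pattern)` is any containment test."""
    return {sid for sid, seq in database.items() if contains(seq, pattern)}


def support_of_set(patterns, database, contains):
    """Eq. (3): supp*(P) = |union of Seq_id(p) for p in P|; a sequence containing
    several patterns of P is counted only once."""
    ids = set()
    for p in patterns:
        ids |= seq_ids(p, database, contains)
    return len(ids)


def support_loss(supp_s, ti_forms, database, contains):
    """Eq. (2): sLoss(s) = (supp(s) - supp*(P)) / supp(s), P being ti-forms of s."""
    return (supp_s - support_of_set(ti_forms, database, contains)) / supp_s


def summarize(losses):
    """Average support loss over a set of s-patterns and its standard deviation
    (population standard deviation, an assumption)."""
    return mean(losses), pstdev(losses)
```

In a toy database where sequence 1 contains both p1 and p2, sequence 2 only p1, sequence 3 only p2, and sequence 4 neither, supp*({p1, p2}) = 3 while supp(p1) + supp(p2) = 4; if supp(s) = 4, the support loss of s is 0.25.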


3.2.3 Evaluating the impact of time on the set of possible future activities of students

In the following third and fourth steps, we aim to evaluate
the benefit brought by time in patterns (through ti-patterns)
regarding the possible future activities of students. To perform
this evaluation, we adopt a two-stage approach.

Let p be a ti-pattern and extPart(p) the set of extended 
parts of p (see Def. 3.6). From the educational point of 
view, the set of extended parts of a ti-pattern p represents 
the ti-activities that students frequently do after p. 


In this third step, we aim at discovering whether managing time
reduces the uncertainty about the future activities
of students. We compare the set of extended parts of
s-patterns and the set of extended parts of their ti-forms.

To conduct this comparison, we propose to use the
well-known entropy measure. The entropy of a pattern $p$
represents the “degree of disorder” of the set of its extended parts.
From the educational point of view, given an activity performed by
students, the entropy measures the uncertainty of their
following activities. The higher the entropy, the more uncertain
the following activities. Relying on entropy is not new
in the educational field [38]. Equation (4) presents the way
the entropy of a ti-pattern $p$ is evaluated.


$$\text{Ent}(p) = -\sum_{j=1}^{m} \text{prob}(p_j) \log_2(\text{prob}(p_j)) \qquad (4)$$

where $p_j$ is one of the $m$ extended parts of $p$ and $\text{prob}(p_j)$ is the
probability of observing $p_j$ among the extended parts of $p$. The
same equation stands for s-patterns.

Given an s-pattern $s$, we thus propose to evaluate the benefit
of considering time-intervals in this pattern by evaluating
the entropy loss (see Equation 5). The entropy loss of
an s-pattern $s$ considers the entropy of $s$ ($\text{Ent}(s)$) and the
maximum entropy of its ti-forms.

$$\text{eLoss}(s) = \frac{\text{Ent}(s) - \max_{p \in \text{ti-form}(s)} \{\text{Ent}(p)\}}{\text{Ent}(s)} \qquad (5)$$


Several cases may arise. First, eLoss = 1. This represents
the best case: each of the ti-forms of $s$ has exactly one
extension. This means that when managing time in patterns,
the future activities are totally certain.

Second, eLoss = 0. This case represents one of the worst
cases: at least one ti-form of $s$ has the same entropy as $s$,
and none has a higher one. Here, we cannot say that managing
time makes the possible future activity less uncertain.

Last, eLoss < 0. This case represents the other worst case:
at least one ti-form of $s$ has an entropy higher than that of $s$.
In this case, considering time decreases the quality of the
model. Notice here that the term Loss is a misnomer, as it may
theoretically be negative. However, this term has been chosen
to be coherent with the previous measures.

As a consequence, the higher the entropy loss, the more
managing time in patterns helps to estimate
students’ future activities.
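Eqs. (4) and (5) can be sketched as below. Because the definition of prob(p_j) is only partially recoverable from the text, this sketch assumes it is the relative frequency of each extended part, computed from their supports. Returning None when Ent(s) = 0 (where Eq. (5) is undefined) is also our choice.

```python
import math


def entropy(extension_supports):
    """Eq. (4): entropy of the extended parts of a pattern.
    extension_supports: supports of its m extended parts; prob(p_j) is taken as
    the relative frequency of p_j (assumption)."""
    total = sum(extension_supports)
    return -sum((c / total) * math.log2(c / total)
                for c in extension_supports if c > 0)


def entropy_loss(ent_s, ti_form_entropies):
    """Eq. (5): (Ent(s) - max Ent(p)) / Ent(s) over the ti-forms p of s."""
    if ent_s == 0:
        return None  # undefined: s already has a single, certain extension
    return (ent_s - max(ti_form_entropies)) / ent_s
```

An s-pattern with two equally frequent extended parts has an entropy of 1 bit; if each of its ti-forms has a single extended part (entropy 0), its entropy loss is 1, the best case described above, and a ti-form with entropy 1.5 gives a negative loss of -0.5.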


3.2.4 Evaluating the impact of a specific time-interval on students’ future activities

This fourth and last step is dedicated to the evaluation of
the impact of a specific time-interval of a ti-pattern on its
extended parts. More precisely, we are interested in the
impact of the last time-interval of a pattern. We focus on the
following situation: given two pseudo-equivalent ti-patterns
(cf. Def. 3.7), to what extent do their sets of extended parts
differ?

This evaluation lets us study to what extent two students
who perform the same temporal activity, except for the
time of their last activity, have identical future activities.
In other words, is a temporal difference between two activities
an indicator that these activities are beginning to diverge?


To perform this evaluation, we first evaluate the proportion
of identical ti-patterns between pairs of sets of extended
parts, as defined in Equation (6).

$$\text{idExt}(PQ) = \frac{1}{|PQ|} \sum_{(P, Q) \in PQ} \frac{|P \cap Q|}{|P \cup Q|} \qquad (6)$$

with $PQ = \{(\text{extPart}(p), \text{extPart}(q)) \mid \text{psd-eq}(p, q)\}$ the pairs
of sets of extended parts of all pseudo-equivalent pairs of
ti-patterns. The higher this proportion, the lower the impact
of the last time-interval.


Second, we rely on the proportion of s-equivalent extended
parts. This measure also evaluates the impact of the last
time-interval on the set of extended parts, but by considering
only their sequential nature. The proportion of s-equivalent
extended parts is defined in Equation (7).

$$\text{sidExt}(PQ) = \frac{1}{|\text{s-form}(PQ)|} \sum_{(P, Q) \in PQ} \frac{|\text{s-form}(P) \cap \text{s-form}(Q)|}{|\text{s-form}(P) \cup \text{s-form}(Q)|} \qquad (7)$$

This proportion indicates whether students tend to share their
following sequential activities, even though they differ in their
last time-interval. Here also, the higher this proportion, the
lower the impact of the time-interval.

Notice that for reasons of readability, $\text{s-form}(\cdot)$ is used here
to represent the sequential form of a set of ti-patterns and
of a set of pairs of ti-patterns.
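Eq. (6) amounts to an average Jaccard similarity over the pairs of pseudo-equivalent ti-patterns. The sketch below builds PQ from a hypothetical mapping of each ti-pattern (a flat tuple (E1, τ1, E2, ...)) to the set of its extended parts, then averages; treating two empty sets as identical is our assumption. sidExt can be handled in the same spirit by first mapping each extended part to its sequential form.

```python
from itertools import combinations


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical (assumption)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pseudo_equivalent(p, q):
    """Def. 3.7 on flat tuples (E1, tau1, E2, ..., En): same events and intervals,
    except the last time-interval, which must differ."""
    return (len(p) == len(q) >= 3 and p[:-2] == q[:-2]
            and p[-1] == q[-1] and p[-2] != q[-2])


def pseudo_equivalent_pairs(ext_parts):
    """Build PQ from {ti-pattern: set of its extended parts}."""
    return [(ext_parts[p], ext_parts[q])
            for p, q in combinations(ext_parts, 2) if pseudo_equivalent(p, q)]


def id_ext(pq):
    """Eq. (6): mean Jaccard similarity over the pairs of PQ."""
    return sum(jaccard(P, Q) for P, Q in pq) / len(pq)
```

If ⟨a τ0 b⟩ has the extended parts {⟨b τ0 c⟩, ⟨b τ1 d⟩} and ⟨a τ1 b⟩ only {⟨b τ0 c⟩}, the two patterns form the single pseudo-equivalent pair and idExt = 1/2.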


4. EXPERIMENTS 


We apply the methodology described in the previous section 
to evaluate to what extent mining ti-patterns increases the 
knowledge about students’ activities. We first present the 
dataset on which the experiments are conducted, then use 
the 4 steps of the methodology and draw conclusions for each 
of them. Finally, some mined ti-patterns are displayed. 


4.1 Dataset overview and implementation 

We collected data from 113 first-year university students enrolled in a Mathematics and Computer Science Bachelor program, who interact with learning resources on their LMS. We focus on one specific course, Algorithms and Programming, from the Fall 2018 semester. This course is a core course of the program. Diverse online materials are available: slides, exercises for lab sessions, tests, etc.

Most of the students own a personal computer, so they can 
access the course both during teaching hours (lectures or lab 
sessions) and after official teaching hours. 

The set of events F is made up of 35 learning resources that students can consult. About 50% of these resources are studied during teaching hours (lectures or lab sessions). The dataset contains about 6,300 actions, i.e. about 56 resource consultations per student on average. The dataset spans almost one year, as it includes actions performed not only during the teaching period but also during revisions for the final examination and for the retake examination (for the subset of students who failed the final examination).


In the experiments conducted, we use a relative minimum support δ = 0.1. Two algorithms are studied: PrefixSpan, to mine sequential patterns, and I-PrefixSpan, to mine ti-patterns. For I-PrefixSpan, we use the source code available in [12], which we slightly adapted to our needs. For the classical PrefixSpan algorithm, we use the implementation proposed by Gao [14].
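For readers less familiar with PrefixSpan, its prefix-projection principle can be sketched in a few lines of Python. This is neither Gao's implementation [14] nor the code of [12]: it is a simplified illustration that assumes each student's log is a list of single resource ids (no itemsets, no timestamps) and takes the minimum support as an absolute number of students rather than the relative δ.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def prefixspan(db: Sequence[Sequence[str]], min_sup: int) -> List[Tuple[Pattern, int]]:
    """Sequential patterns supported by at least min_sup sequences, found by prefix projection."""
    results: List[Tuple[Pattern, int]] = []

    def mine(prefix: Pattern, projected: List[Tuple[int, int]]) -> None:
        # projected: (sequence index, start of the suffix that follows the current prefix)
        support: Dict[str, Set[int]] = defaultdict(set)
        for sid, start in projected:
            for item in db[sid][start:]:
                support[item].add(sid)  # each sequence counts at most once per item
        for item in sorted(support):
            if len(support[item]) < min_sup:
                continue
            new_prefix = prefix + (item,)
            results.append((new_prefix, len(support[item])))
            # Project every suffix on the first occurrence of item, then grow the prefix.
            new_projected = []
            for sid, start in projected:
                seq = db[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            mine(new_prefix, new_projected)

    mine((), [(sid, 0) for sid in range(len(db))])
    return results


logs = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
print(prefixspan(logs, 2))
# [(('a',), 2), (('a', 'c'), 2), (('b',), 2), (('b', 'c'), 2), (('c',), 3)]
```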


4.2 Determining the set of time-intervals 

We propose to study two types of intervals: Linear intervals, 
where each interval has an equal duration, and granular in- 
tervals, where the duration of intervals grows with the gap 
value. 
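Mapping a gap to a linear interval can be sketched as follows in Python. We assume gaps in seconds, n intervals of equal width w, and a last interval that absorbs every gap beyond the horizon (n − 1)·w, which matches the "more than 12 hours" interval of the 30 min set discussed below; `linear_interval` is a hypothetical name.

```python
def linear_interval(gap_seconds: float, width_seconds: float, n_intervals: int) -> int:
    """Index k of the linear interval I_k containing the gap; the last interval is open-ended."""
    if gap_seconds < 0:
        raise ValueError("the gap between two events cannot be negative")
    return min(int(gap_seconds // width_seconds), n_intervals - 1)


HALF_HOUR = 30 * 60
# With 25 half-hour intervals, I0..I23 cover the first 12 hours and I24 collects longer gaps.
print(linear_interval(10 * 60, HALF_HOUR, 25))    # 10 min -> 0
print(linear_interval(45 * 60, HALF_HOUR, 25))    # 45 min -> 1
print(linear_interval(20 * 3600, HALF_HOUR, 25))  # 20 h   -> 24
```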


Table 1 presents the various sets of intervals studied. For each of them, it displays the number of intervals, the maximal horizon, the fitting of the set, the frequency of each frequent interval and the number of frequent patterns. To avoid an artificially high fitting value, we consider an interval frequent only if its frequency is at least 10. The frequency of an interval is the number of times the interval is used in the frequent patterns.
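Our reading of the fitting measure is the share of the intervals of a set that are frequent. This is consistent with the values discussed below (for instance 8% = 2/25 for the 30 min set, where only I0 and I24 are used), but it remains an assumption. Under that assumption, interval frequency and fitting can be sketched in Python as follows (hypothetical names; ti-patterns as tuples with interval labels at odd positions; invented data).

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple

TiPattern = Tuple[str, ...]  # assumed: events at even positions, interval labels at odd positions


def interval_frequencies(frequent_patterns: Iterable[TiPattern]) -> Counter:
    """Number of times each interval label is used in the frequent ti-patterns."""
    counts: Counter = Counter()
    for pattern in frequent_patterns:
        counts.update(pattern[1::2])
    return counts


def fitting(frequent_patterns: Iterable[TiPattern], intervals: Sequence[str], min_freq: int = 10) -> float:
    """Share of the intervals whose frequency is at least min_freq (assumed definition)."""
    counts = interval_frequencies(frequent_patterns)
    return sum(1 for label in intervals if counts[label] >= min_freq) / len(intervals)


labels = [f"I{k}" for k in range(25)]
toy = [("e1", "I0", "e2")] * 12 + [("e3", "I24", "e4")] * 10 + [("e5", "I3", "e6")] * 9
print(fitting(toy, labels))  # 2 frequent intervals out of 25 -> 0.08
```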


Before going into the details of the analysis of the sets of intervals, we would like to point out that the sets do not all have the same number of intervals, so the values in Table 1 are not directly comparable. In addition, two contiguous granular intervals can represent very different durations (for example, up to 1 hour and up to 1 day), so their frequencies are not comparable either. Last, notice that the total number of patterns obtained with one set of intervals cannot be derived from the number of patterns obtained with another set. Consider, for example, two sets of intervals and their associated numbers of patterns, and suppose that the intervals of the first set are, on average, twice as long as those of the second. A pattern that is frequent with the first set may correspond to two frequent patterns with the second set, to only one, or to none at all (see Section 3.2.2).
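A toy Python sketch makes this last point concrete. It considers a single two-event pattern, assumes that each student contributes exactly one gap, and uses an absolute minimum support of 11 students (our choice, close to δ = 0.1 for 113 students); all gaps are invented. Halving the interval width turns one frequent pattern into two, one or zero frequent patterns, depending on how the gaps are distributed.

```python
from collections import Counter
from typing import Dict, Sequence


def frequent_intervals(gaps_minutes: Sequence[float], width_minutes: float, min_sup: int) -> Dict[int, int]:
    """Support of each linear interval for one two-event pattern, keeping only the frequent ones.

    Simplification: each gap comes from a different student, so it supports exactly one ti-form.
    """
    support = Counter(int(gap // width_minutes) for gap in gaps_minutes)
    return {k: s for k, s in sorted(support.items()) if s >= min_sup}


MIN_SUP = 11
scenarios = {
    "two": [10] * 11 + [40] * 11,  # 1 h intervals: I0 = 22 -> 30 min intervals: I0 = 11, I1 = 11
    "one": [10] * 11 + [40] * 2,   # 1 h intervals: I0 = 13 -> 30 min intervals: I0 = 11
    "none": [10] * 6 + [40] * 6,   # 1 h intervals: I0 = 12 -> 30 min intervals: nothing reaches 11
}
for name, gaps in scenarios.items():
    coarse = frequent_intervals(gaps, 60, MIN_SUP)  # 1-hour intervals
    fine = frequent_intervals(gaps, 30, MIN_SUP)    # 30-minute intervals
    print(name, coarse, fine)
```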


| Type | Duration | Number of intervals | Horizon | Fitting | Frequent intervals and their frequency | Number of patterns |
|---|---|---|---|---|---|---|
| Linear | 30 min | 25 | 12 h | 8% | I0: 350,000; I24: 1,770,000 | 550,000 |
| Linear | 1 hour | 25 | 12 h | 8% | I0: 549,380; I24: 1,843,660 | 356,811 |
| Linear | 1 day | 25 | 24 d | 72% | I0: 90,976; I1: 42; I2: 25; I3: 54; I4: 119; I5: 36; …; I11: 14; …; I17: 14; …; I21: 45; I24: 33,298 | 37,764 |
| Granular | exponential | 16 | about 4 months | 56% | I0: 17,739; I1: 79; I8: 82; I9: 269; I10: 4,278; I11: 5,126; I12: 6,408; I13: 3,693; I14: 1,159 | 15,754 |
| Granular | human | 6 | 1 year | 100% | I0 (sec): 7,706; I1 (min): 10,551; I2 (hour): 1,615; I3 (day): 30,925; I4 (week): 68,614; I5 (month): 22,479 | 51,025 |

Table 1: Fitting, examples of intervals and number of patterns for several sets of intervals

Let us first consider the three sets of linear intervals. For the first two sets (30 min and 1 hour), the fitting measure is quite low (8%), which means that the vast majority of intervals are not found in frequent patterns. For example, in the "30 min" set, only the first interval (between 0 and 30 minutes) and the last interval (more than 12 hours) are not empty. We can conclude that neither set of intervals is a good candidate. Caution must be exercised in interpreting this result. It might mean that students do not regularly switch from one resource to another with a time gap between 30 minutes and 12 hours. It can also mean that the 30 min time-interval is not relevant. Despite the lack of relevance of these intervals, the number of patterns discovered is large. As only two intervals are used, we can consider that I-PrefixSpan behaves here almost like PrefixSpan.

The fitting value of the "1 day" set is much larger (72%), which means that most of the 25 intervals are frequent. However, the total number of frequent patterns in this set is strongly reduced compared to the "30 min" and "1 hour" sets (by a factor of about 10). In addition, apart from the first and last intervals, many interval frequencies are low, some of them close to the minimal threshold. This suggests that many intervals are not really representative of the data. Moreover, although the number of intervals is quite large (25), the maximal horizon represented by this set remains limited: all intervals together, except the last one, cover a horizon of less than a month, whereas the dataset spans almost one year. The horizon could of course be extended, but at the cost of an even larger number of intervals, as well as increased space and computation time. These results suggest that the set of intervals should contain small intervals for close events (as suggested by the frequency of I0 in the "30 min" set) and larger intervals for more distant gaps (as suggested by the frequency of I24 in the "1 day" set). Thus, a granular set of intervals should better fit the dataset.


We now study two sets of granular intervals. In the first set, the duration of intervals grows exponentially: each interval lasts twice as long as the preceding one. The fitting of this set is greater than that of the first two sets, but smaller than that of the third one. Nevertheless, its horizon is larger than that of all the previous sets (about 4 months), and the number of intervals is reduced. The empty intervals (from I2 to I7) roughly correspond to gaps between 20 min and 10 h 40 min.
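One way to build such an exponential set, sketched below in Python, starts from a first interval of duration d and doubles the duration at each step, so that I_k covers [d(2^k − 1), d(2^(k+1) − 1)[ and the last interval is open-ended. The value of d used in the experiments is not given in this section, so d is a parameter here; the function name is hypothetical.

```python
def exponential_interval(gap: float, first_duration: float, n_intervals: int) -> int:
    """Index k of the interval containing the gap; each interval lasts twice as long as the previous one.

    I_k covers [d(2^k - 1), d(2^(k+1) - 1)) with d = first_duration; the last interval is open-ended.
    """
    if gap < 0:
        raise ValueError("the gap between two events cannot be negative")
    k, duration, upper = 0, first_duration, first_duration
    while gap >= upper and k < n_intervals - 1:
        duration *= 2      # the next interval is twice as long
        upper += duration  # upper bound of the next interval
        k += 1
    return k


# With d = 1 minute: I0 = [0, 1[, I1 = [1, 3[, I2 = [3, 7[, I3 = [7, 15[ (in minutes).
for gap in (0.5, 2, 5, 10):
    print(gap, exponential_interval(gap, 1, 16))
```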


The second set of granular intervals is referred to as "human": its intervals are designed to represent the human natural time (minute, hour, day, week, etc.). This set of intervals has a maximal fitting (100%). Unlike the "1 day" set, which had the highest fitting value so far, this set has a large frequency for every interval (greater than 1,600) and a small number of intervals (only 6). Besides, its total number of patterns is larger than for both the "1 day" and the "exponential" sets.


All these elements lead us to consider the "human" set as the best set of intervals. In this set, time is represented by intervals whose upper bounds are a minute, an hour, a day, a week, a month and a year. This set has a maximal fitting (100%), covers a large horizon (up to a year, which corresponds to the span of the dataset) with a limited number of intervals (6), and yields a fairly large number of frequent temporal patterns. Therefore, this set of intervals is used in the following experiments.
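The "human" set can be sketched in Python with a sorted list of upper bounds and `bisect`. The 1-minute bound of I0 and the 1-hour and 1-day bounds of I2 follow the text of Section 4.6; the week and month bounds, including approximating a month by 30 days, are our assumptions, and I5 is left open-ended since the dataset spans less than a year.

```python
import bisect

MINUTE, HOUR, DAY = 60, 3600, 86400
WEEK = 7 * DAY
MONTH = 30 * DAY  # assumption: a month is approximated by 30 days

# Upper bounds (in seconds) of I0 (seconds) .. I4 (weeks); I5 (months) is open-ended.
HUMAN_BOUNDS = [MINUTE, HOUR, DAY, WEEK, MONTH]
HUMAN_LABELS = ["I0 (sec)", "I1 (min)", "I2 (hour)", "I3 (day)", "I4 (week)", "I5 (month)"]


def human_interval(gap_seconds: float) -> int:
    """Index k of the 'human' interval I_k, i.e. HUMAN_BOUNDS[k-1] <= gap < HUMAN_BOUNDS[k]."""
    if gap_seconds < 0:
        raise ValueError("the gap between two events cannot be negative")
    return bisect.bisect_right(HUMAN_BOUNDS, gap_seconds)


for gap in (30, 600, 2 * HOUR, 3 * DAY, 10 * DAY, 60 * DAY):
    print(gap, HUMAN_LABELS[human_interval(gap)])
```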


We would also like to highlight that this set of intervals intrinsically reflects the classical rhythm of courses; for example, one lecture (or one lab session) is planned each week. The human set of intervals thus allows us to mine patterns that represent students' natural temporal activities: some students tend to work immediately after a lab session (or a lecture), represented by I0 or I1; other students wait for some hours within the same 24 h; and others work during the week, or even the week after (before the next session), represented by I4. This is typically the type of information we expect to obtain when we aim at modeling students' activities.


4.3 Comparing s-patterns and ti-patterns 

This second experiment compares the sets of s-patterns and ti-patterns. Table 2 presents both sets of patterns, along with the measures introduced in the methodology. Let us first focus on the number of frequent patterns (Line 1). The total number of frequent ti-patterns is dramatically smaller than the number of frequent s-patterns: the pattern loss is larger than 0.99. This means that the great majority of s-patterns have no frequent ti-form, probably because the occurrences of an s-pattern are spread over numerous ti-patterns. These findings are in line with [22]. In addition, Line 2 shows that the average length of ti-patterns is about half that of s-patterns, and the same holds for their maximal length. A first conclusion that can be drawn here is that most frequent sequential patterns show no recurrence in their time-intervals. In other words, students tend to have numerous recurrent sequential activities, but far fewer recurrent time-interval activities. However, even though the average length of patterns is divided by 2, ti-patterns still have a significant length, which means that they do represent meaningful students' activities.

| Line | Measure | PrefixSpan | I-PrefixSpan |
|---|---|---|---|
| 1 | Number of patterns | SP: 12,826,760 | PP: 51,025; pLoss(SP) = 0.998 |
| 2 | Average length | … | 3.8 |
| 3 | Max. length | … | 8 |
| 4 | Example of pattern s such that frequent(s) ∧ ∄p (isform(s, p) ∧ frequent(p)) | s = (e31, e29), supp(s) = 26 | ts1 = (e31 I0 e29), supp(ts1) = …; ts2 = (e31 I1 e29), supp(ts2) = …; ts3 = (e31 I2 e29), supp(ts3) = …; ts4 = (e31 I3 e29), supp(ts4) = 3; ts5 = (e31 I4 e29), supp(ts5) = 4; ts6 = (e31 I5 e29), supp(ts6) = 7 |
| 5 | Example of pattern s such that frequent(s) ∧ ∃(pi, pj) (frequent(pi) ∧ frequent(pj)) | s = (e22, e33), supp(s) = 53 | p1 = (e22 I0 e33), supp(p1) = 25; p2 = (e22 I1 e33), supp(p2) = 22 |
| 6 | Support loss | | sLoss(SP^ti) = 0.33; std(sLoss(SP^ti)) = 0.10 |

Table 2: Comparison of sets of patterns mined with PrefixSpan and I-PrefixSpan (δ = 0.1, i.e. 11 students)

Moreover, tens of thousands of s-patterns (about 34,000) have one or more frequent ti-forms (about 51,000 ti-patterns in total). This means that, for these sequential activities, there are actual temporal regularities. These activities are studied in more detail in the following sections.


Lines 4 and 5 of Table 2 give examples of s-patterns and their candidate or frequent ti-forms. Line 4 presents one of the 99.8% of s-patterns that have no frequent ti-form (thus, from SP°). This pattern, s = (e31, e29), has 6 candidate ti-forms, but none of them is frequent. We can conclude that no obvious time-interval regularity is observed for this activity, which therefore does not seem to be guided by temporal constraints. We can also observe that the sum of the supports of the ti-patterns is greater than the support of their s-form s, as mentioned in Section 3.2.2.
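This happens because one student's sequence can contain the s-pattern several times with different gaps, and thus support several ti-forms at once. The toy Python sketch below uses invented timestamps (in seconds) and "human"-style interval bounds in which the week and month bounds are assumptions. It counts supports in students and shows the sum of ti-form supports exceeding the s-pattern support. Counting every ordered pair of occurrences is only one possible convention, and `supports` is a hypothetical name.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Assumed upper bounds (seconds) of I0..I4 in a "human" set; I5 is open-ended.
BOUNDS = [60, 3600, 86400, 7 * 86400, 30 * 86400]
Log = List[Tuple[str, int]]  # (resource id, timestamp in seconds), sorted by time


def supports(logs: List[Log], first: str, second: str) -> Tuple[int, Dict[str, int]]:
    """Support of the s-pattern (first, second) and of each ti-form (first I_k second), in students."""
    s_students: Set[int] = set()
    ti_students: Dict[str, Set[int]] = defaultdict(set)
    for student, log in enumerate(logs):
        for i, (event_i, t_i) in enumerate(log):
            if event_i != first:
                continue
            for event_j, t_j in log[i + 1:]:
                if event_j == second:
                    s_students.add(student)
                    k = bisect.bisect_right(BOUNDS, t_j - t_i)
                    ti_students[f"I{k}"].add(student)
    return len(s_students), {label: len(ids) for label, ids in sorted(ti_students.items())}


logs = [
    [("e31", 0), ("e29", 30), ("e29", 7200)],  # student 0: gaps of 30 s (I0) and 2 h (I2)
    [("e31", 0), ("e29", 120)],                # student 1: gap of 2 min (I1)
]
s_supp, ti_supp = supports(logs, "e31", "e29")
print(s_supp, ti_supp, sum(ti_supp.values()))  # 2 {'I0': 1, 'I1': 1, 'I2': 1} 3
```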


In the remaining set of sequential patterns SP^ti, made up of about 34,000 s-patterns, 65% of the s-patterns have exactly one frequent ti-form and 91% have one or two. The highest number of frequent ti-forms for an s-pattern is 9, which is quite high. Let us now consider Line 5 of Table 2, which presents an s-pattern with several frequent ti-forms. This s-pattern, s = (e22, e33), has a support equal to 53 and two frequent ti-forms. Such a pattern occurs with two temporal regularities, and most of its occurrences have a time gap between 1 minute and 1 day. Such patterns are highly interesting and are studied further below.


Based on these findings, it is legitimate to ask whether a ti-pattern-based model can replace an s-pattern-based model. Line 1 gives first indications: many sequential patterns "disappear" with such a model (more than 99% of sequential patterns have no frequent ti-pattern). If the objective is to replace traditional s-patterns with ti-patterns, the coverage of the model becomes a problem. However, if the goal is to identify which (sequential) activities have temporal regularities, ti-patterns are of the highest interest.


Let us now focus on the support loss associated with the complete set SP^ti of s-patterns that have at least one frequent ti-form: sLoss(SP^ti) = 0.33, with a standard deviation of 0.10. This means that, on average, one third of the occurrences of an s-pattern "disappear", i.e. they do not belong to any frequent ti-form. We can conclude that, among patterns with identified temporal regularities, 33% of the occurrences do not follow these regularities, which may be considered high.
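Under our reading of the definition, which is an assumption here, the support loss of one s-pattern is the share of its supporting sequences that are covered by none of its frequent ti-forms; the reported value would then be the mean of this quantity over the s-patterns that have at least one frequent ti-form. A minimal Python sketch with hypothetical names and invented student ids:

```python
from statistics import mean
from typing import Iterable, List, Set


def s_loss(s_support: Set[int], frequent_ti_supports: Iterable[Set[int]]) -> float:
    """Share of the sequences supporting s that no frequent ti-form of s covers (assumed definition).

    s_support is never empty in practice, since s itself is frequent.
    """
    covered: Set[int] = set()
    for students in frequent_ti_supports:
        covered |= students
    return 1 - len(covered & s_support) / len(s_support)


# Invented example: s is supported by students 0..5; its two frequent ti-forms cover students 0..3.
loss = s_loss(set(range(6)), [{0, 1}, {2, 3}])
print(round(loss, 3))  # 0.333

# Aggregating over several s-patterns that have at least one frequent ti-form:
losses: List[float] = [loss, s_loss({0, 1, 2}, [{0, 1, 2}])]
print(round(mean(losses), 3))  # (1/3 + 0) / 2 -> 0.167
```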


4.4 Evaluating the impact of time on the set of possible future activities of students

Following our methodology, we now evaluate whether ti-patterns carry more information than s-patterns about students' future activities. As a preliminary remark, no s-pattern s has eLoss(s) < 0. We mentioned previously that this case would occur rarely; in practice, it does not occur here.


In the set SP^ti, 71% of the patterns have at least one extension in SP (see Def. 3.5). Let us first consider the 66% of these patterns that have a unique extension. By definition, for these patterns, Ent(s) = 0 and Ent(p) ≥ 0 for all p ∈ ti-form(s). The first row of Table 3 is an example of such a case. The s-pattern (e24 e27 e14) has only one extended part, so its entropy equals zero. It has three ti-forms, but only one of them has an extended part, so all these ti-forms have zero entropy.

In this case, even though the entropy loss is null, the information about students' future activities is increased, as only one ti-pattern has a frequent extended part.

Let us now consider the remaining 34% of patterns, which have more than one extension in SP. Their average entropy is 0.84, with a maximal entropy of 7.71. For the set of their ti-forms, the average entropy is 0.35 and the maximal entropy is 6.22. To make the entropies as comparable as possible, the average entropy of s-patterns has been computed only over the s-patterns that have at least one ti-form. We first notice that the entropy of s-patterns is globally higher than that of ti-patterns (for both maximal and average values). More precisely, the average entropy of s-patterns is 2.4 times that of ti-patterns. We can thus draw a first global conclusion: managing time in patterns decreases the uncertainty about students' future activities.

| s-pattern | Example of extPart(s) | Ent(s) | Examples of p ∈ ti-form(s) | nbExt(p) | Examples of extPart(p) | max-Ent(p) | mean-Ent(p) |
|---|---|---|---|---|---|---|---|
| (e24 e27 e14) | (e14 e12) | 0 | (e24 I3 e27 I2 e14) (*) | 1 | (e14 I5 e12) | 0.0 | 0.0 |
| (e1 e10 e12) | … | 5.49 | (e1 I5 e10 I1 e12) | 0 | none | 4.55 | 0.78 |
| | | | (e1 I5 e10 I0 e12) | 1 | (e12 I4 e3) | | |
| | | | (e1 I5 e10 I2 e12) | 13 | (e12 I5 e{19,13,12}) | | |
| | | | (e1 I2 e10 I… e12) | 24 | (e12 I5 e{19,13,15}) | | |

Table 3: Examples of s-patterns with ti-forms, extended parts and entropy values. (*) The corresponding extension pattern is p* = (e24 I3 e27 I2 e14 I5 e12).

We now compare the entropy of each s-pattern with the entropy of its ti-forms (through eLoss). In 68% of the cases, the entropy loss between the s-patterns and their ti-forms is higher than 0. This means that, when considering a temporal student activity, in 2 cases out of 3 the future activity of this student is less uncertain than when considering his/her sequential activity. These 68% split into 51% with a loss equal to 1, which means that future activities become certain, and 17% with a loss between 0 and 1. The average entropy loss over all s-patterns is quite high: eLoss = 0.4. Roughly speaking, the future activities of students are on average 40% less uncertain when time is managed in patterns, which is highly promising.
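A Python sketch of Ent and eLoss under explicit assumptions: Ent is taken as the Shannon entropy, in bits, of the distribution of supports over a pattern's extended parts, and eLoss(s) = 1 − max over p ∈ ti-form(s) of Ent(p) / Ent(s). The latter is our reading of the discussion of Table 3, where eLoss is described as low because the maximal ti-form entropy (4.55) is close to Ent(s) (5.49); eLoss is set to 0 when Ent(s) = 0, as for the first row of Table 3. Neither formula is quoted from the methodology section, and the function names are hypothetical.

```python
import math
from typing import Sequence


def entropy(ext_supports: Sequence[int]) -> float:
    """Shannon entropy (bits) of a pattern's extended parts, weighted by their supports (assumption)."""
    total = sum(ext_supports)
    if total == 0:
        return 0.0  # no extended part: nothing is uncertain
    return -sum((c / total) * math.log2(c / total) for c in ext_supports if c > 0)


def e_loss(ent_s: float, ent_ti_forms: Sequence[float]) -> float:
    """Assumed eLoss(s) = 1 - max_p Ent(p) / Ent(s); 0 when Ent(s) = 0."""
    if ent_s == 0:
        return 0.0
    return 1 - max(ent_ti_forms, default=0.0) / ent_s


print(entropy([5, 5]))                       # two equally supported extensions -> 1.0 bit
print(e_loss(1.0, [0.0, 0.0]))               # every ti-form has a certain future -> 1.0
print(round(e_loss(5.49, [4.55, 0.0]), 2))   # maximal ti-form entropy 4.55 vs Ent(s) 5.49 -> 0.17
```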


These experiments confirm that, in most cases, time-interval patterns give a better view of students' subsequent activities. In addition, for a significant number of activities, the future activities become totally certain.


Let us now focus on the example presented in the second row of Table 3. The s-pattern s = (e1 e10 e12) has many extensions in SP and many ti-forms, many of which have extensions. Notice that, although the entropy loss is low (the maximal entropy of the ti-forms is 4.55), the entropy of the ti-forms is significantly lower on average (0.78). In this specific case, the eLoss measure is not very representative of the difference in entropy: the actual entropy decrease is probably higher than the eLoss value suggests.


4.5 Evaluating the impact of a specific time-interval on students' future activities

The experiments conducted here fall within the scope of the last step of our methodology. They evaluate to what extent two students who perform a similar activity (both in terms of resources and time-intervals) and who differ only in their last time-interval have the same future activities. In these experiments, we only focus on patterns made up of at least 3 events (and 2 time-intervals) to ensure that the patterns can be considered as activities.
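Definition 3.7 is not reproduced in this section, so the following reading of pseudo-equivalence is an assumption: same length, same events and same intervals, except for a different last time-interval. Under that reading, the pairs can be built efficiently by grouping patterns on everything but their last interval, as in this Python sketch with hypothetical names, restricted to patterns with at least 3 events.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

TiPattern = Tuple[str, ...]  # assumed: (e1, I_a, e2, I_b, e3, ...), events at even positions


def psd_eq(p: TiPattern, q: TiPattern) -> bool:
    """Same events and intervals, except for a different last time-interval (our reading)."""
    return (
        len(p) == len(q) >= 3
        and p[:-2] == q[:-2]  # identical prefix up to the second-to-last event
        and p[-1] == q[-1]    # identical last event
        and p[-2] != q[-2]    # different last time-interval
    )


def pseudo_equivalent_pairs(patterns: Sequence[TiPattern], min_events: int = 3) -> List[Tuple[TiPattern, TiPattern]]:
    """Unordered pseudo-equivalent pairs among the patterns with at least min_events events."""
    groups: Dict[Tuple[TiPattern, str], List[TiPattern]] = defaultdict(list)
    for p in set(patterns):
        if (len(p) + 1) // 2 >= min_events:
            groups[(p[:-2], p[-1])].append(p)  # pseudo-equivalent patterns share this key
    pairs = []
    for members in groups.values():
        pairs.extend((p, q) for p, q in combinations(sorted(members), 2) if psd_eq(p, q))
    return pairs


p = ("e1", "I5", "e10", "I0", "e12")
q = ("e1", "I5", "e10", "I2", "e12")
r = ("e1", "I2", "e10", "I0", "e12")  # differs in its first interval: not pseudo-equivalent to p
print(pseudo_equivalent_pairs([p, q, r, ("e1", "I0", "e2")]))  # only (p, q)
```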


In the set PQ of |PQ| = 9,510 pseudo-equivalent pairs of patterns (cf. Definition 3.7), 25% of the extended parts of one pattern of a pair are also extended parts of the other pattern (sequentially and temporally identical). An additional 11% of pairs have sequentially identical extended parts. This highlights that, even when two ti-patterns differ only in their last time-interval, this small difference leads to a significant difference in their sets of extended parts. In terms of students' activities, this means that when two students perform exactly the same activity except for the last time-interval, their subsequent activities mostly differ, not only temporally but also sequentially. We can conclude that the last time-interval strongly influences students' future activities and that it may be viewed as an indicator of activities that are beginning to diverge.


The experiments in the two previous sections confirm that ti-patterns increase the information about students' future activities, thereby reducing the uncertainty about this future. As a consequence, time is an important piece of information in students' activities.


4.6 Interpretation of ti-patterns 

In this section, we present examples of frequent ti-patterns, 
in an understandable format to better analyze and under- 
stand students’ activities. 

The event ids in patterns are replaced by their type and an index. Lec_n refers to the slides associated with the n-th lecture; Glos_n is the n-th glossary resource; Sta_n is a syntax resource; Sum_n is a summary resource; Lab_n is a resource containing exercises studied during lab sessions (exercise sheets); FA_n are optional additional exercises; finally, Ad is the advice resource. The time-intervals are noted (Is, Imn, Ih, Id, Iw, Imt), which refer to seconds, minutes, hours, days, weeks and months.

Since the longer an activity, the more information it contains, we focus preferably on the longest ti-patterns.


Activities made up of temporally close events 

Let us start by studying activities that contain only the "seconds" time-interval (i.e. events separated by at most 1 minute). This gives a better view of the type of activities that are performed on the spot. First, the corresponding activities tend to be made up of specific types of events: they are a mix of summary, glossary, syntax, advice and lab resources. Second, the maximal length here is 7, which means that students actually perform long recurrent "quick" activities. Third, when analyzing these activities, we notice that they all share a similar skeleton: students generally start by looking at the following resources (in any order): {Sum3, Sta3, Glos3}, then study one or more Lab exercise sheets and finally consult the advice page. For example, here is a ti-pattern of length 7:

(Sum3 Is Sta3 Is Glos3 Is Lab1 Is Lab… Is Lab3 Is Ad)

Such patterns can be interpreted as typical activities performed when preparing an exam. Not only are several Lab sheets studied, but before these resources students also take a quick look at the syntax, glossary and summary of the lectures. They finally consult the advice page. In such patterns, students interact with resources within a short time, including the lab sheets.
Activities made up of Ih time-intervals

Let us now focus on patterns that use the "hours" time-interval, i.e. patterns made up of events separated by a gap between 1 hour and 1 day. Here again, we identify a skeleton shared by most of the ti-patterns:

(Lab{1,3} Ih Lab{2,3} Ih Lab{2,3,4}), where Lab{1,3} means either Lab1 or Lab3.

These patterns highlight that some students tend to work sequentially on several exercise sheets. A gap between 1 hour and 1 day suggests that students dig deep into their work: they spend several hours on each exercise sheet.

Other intervals 

We have performed a similar study on other time-intervals. 
For each of them, we also identify skeletons shared by almost 
all the patterns. 


An interesting conclusion that can be drawn from these findings is that, for any given time-interval, students perform typical long activities that all share the same skeleton. More importantly, the skeletons of different time-intervals are totally different. We can thus conclude that the type of activity performed is strongly linked with the "rhythm" of the activity, where "rhythm" means a time-interval granularity shared by all the gaps between the events of the activity.

Last, when studying the timestamps associated with each occurrence of the activities presented above, we find no specific associated period: they are performed at any moment in the semester. For example, for the first example given, which is mainly related to the 3rd lecture, we found similar patterns for the 1st, 2nd, etc. lecture resources.


5. DISCUSSION 


While traditional studies emphasize that students have typical sequential learning behaviors (identified by frequent sequential patterns), this study further shows that, for specific activities, students work with temporal regularities. Based on the experiments conducted in the previous sections, we now discuss these results.


The results have highlighted that, among the sets of intervals tested (linear and granular), the one that represents the human natural time is the most relevant, at least for the dataset used in the experiments (see Section 4.2). In addition to outperforming other sets of intervals according to the predefined measures, this set fits the application domain: most lectures and lab sessions last about one hour, successive lectures tend to be one week apart, etc. This eases the interpretation of the discovered patterns. Of course, many other sets of intervals remain untested and may be more adequate. Besides, an automatic approach that learns the optimal set of intervals could be tested, as in [24]. However, this would come at a significant additional computational cost, without any guarantee that the learned intervals are interpretable in the application domain.


As expected, a high number of sequential patterns have no frequent ti-form. In the experiments conducted, we have even highlighted that most sequential activities have no temporal regularities. This results in a high number of "lost" patterns, which can be problematic if we are interested in both frequent ti-patterns and s-patterns. A solution could be to manage both types of patterns together: sequential students' activities mixed with temporal students' activities. Such a solution would not only maintain the coverage of the model, thanks to sequential patterns, but also manage time, thanks to temporal patterns, when suitable. Here is an example of such a pattern: (e2 e27 I1 e13). This pattern means that many students consult e2 then e27 (whatever the time-interval), then consult e13 between 1 minute and 1 hour later.
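Matching such a mixed pattern against a student's log can be sketched in Python under our own assumptions: a gap constraint of None means "any time-interval", an integer k requires the gap to fall into interval I_k of a "human"-style set (whose week and month bounds are assumed), and matching is a plain backtracking search for an embedding in the time-ordered log. The names and logs are hypothetical.

```python
import bisect
from typing import List, Optional, Sequence, Tuple

BOUNDS = [60, 3600, 86400, 7 * 86400, 30 * 86400]  # assumed upper bounds (s) of I0..I4; I5 open-ended
Log = List[Tuple[str, int]]  # (resource id, timestamp in seconds), sorted by time


def matches(events: Sequence[str], gaps: Sequence[Optional[int]], log: Log) -> bool:
    """True if the log contains the events in order, gaps[i] constraining events[i] -> events[i + 1].

    None accepts any time-interval (sequential part); an integer k requires interval I_k (temporal part).
    """
    if len(gaps) != len(events) - 1:
        raise ValueError("expected one gap constraint between each pair of consecutive events")

    def search(k: int, prev_pos: int) -> bool:
        if k == len(events):
            return True
        for pos in range(prev_pos + 1, len(log)):
            event, t = log[pos]
            if event != events[k]:
                continue
            if k > 0 and gaps[k - 1] is not None:
                if bisect.bisect_right(BOUNDS, t - log[prev_pos][1]) != gaps[k - 1]:
                    continue
            if search(k + 1, pos):  # backtrack if the rest of the pattern cannot be embedded
                return True
        return False

    return search(0, -1)


# (e2 e27 I1 e13): any gap between e2 and e27, then e13 between 1 minute and 1 hour after e27.
events, gaps = ["e2", "e27", "e13"], [None, 1]
print(matches(events, gaps, [("e2", 0), ("e27", 3 * 86400), ("e13", 3 * 86400 + 600)]))  # True
print(matches(events, gaps, [("e2", 0), ("e27", 100), ("e13", 100 + 2 * 3600)]))  # False
```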


Focusing on s-patterns and their various frequent ti-forms can help highlight the different learning approaches adopted by students. For example, an activity performed with a gap of less than 1 minute between its events may indicate that the associated students usually download all the resources first and then work offline. The same activity with a time gap between 1 minute and 1 hour may reflect that students work online and do not access a resource before finishing the previous one. So, in addition to highlighting the diversity of students' activities, ti-patterns are also a way to identify students' learning practices. One can foresee that these patterns could be used as input for many works, such as those that focus on students' engagement.


6. CONCLUSION AND FUTURE WORKS 


The study presented in this paper highlights the relevance of using time information when mining patterns of students' activities. A time-interval pattern mining approach, based on the state-of-the-art I-PrefixSpan algorithm, was adopted to conduct this study.


The experiments conducted have pointed out that the nature of the set of intervals used strongly impacts the representativeness of the model, and that the set of intervals that represents the human natural time is adequate. We also found that most sequential students' activities do not correspond to any time-interval activity. However, in the other cases, managing time-intervals provides a better view of students' possible future activities, thanks to temporal indicators. Moreover, results show that when two sequentially equivalent patterns differ by a single time-interval (their last one), their subsequent activities differ significantly.

We thus confirm our hypothesis: temporal information is highly promising for a more precise modeling of students' activities. An additional experiment illustrated frequent students' activities that are both temporal and sequential. It showed that, by looking at specific time-intervals, we can understand which activities students often perform instantly or over a longer period.

The work we have conducted is a first step towards longer-term research. One of our future goals is to provide students with recommendations of educational resources. By relying on ti-patterns, we are confident that not only will the accuracy of the recommendations provided to students be increased, but these patterns will also indicate the right time to propose recommendations to students.